Semi-supervised atmospheric component learning in low-light image problem

Ambient lighting conditions play a crucial role in determining the perceptual quality of images from photographic devices. In general, inadequate transmission light and undesired atmospheric conditions jointly degrade the image quality. If we know the desired ambient factors associated with the given low-light image, we can recover the enhanced image easily. Typical deep networks perform enhancement mappings without investigating the light distribution and color formulation properties. This leads to a lack of image instance-adaptive performance in practice. On the other hand, physical model-driven schemes suffer from the need for inherent decompositions and multiple objective minimizations. Moreover, the above approaches are rarely data efficient or free of post-prediction tuning. Motivated by these issues, this study presents a semi-supervised training method using no-reference image quality metrics for low-light image restoration. We incorporate the classical haze distribution model to explore the physical properties of the given image, learn the effect of atmospheric components, and minimize a single objective for restoration. We validate the performance of our network on six widely used low-light datasets. Experimental studies show that our proposed method achieves competitive performance on no-reference metrics compared to current state-of-the-art methods. We also show the improved generalization performance of our proposed method, which is effective at preserving face identities in extreme low-light scenarios.


Reviewer #1
1.1. It should be explicitly noted in the manuscript that a lower NIQE score indicates better perceptual quality.
In Section 3.1, we added that NIQE quantifies the deviation of an image from natural image statistics, so a lower score indicates better perceptual quality. We hope this makes the paper clearer. We also noted in the evaluation section that a lower value is better for all three evaluation metrics.
1.2. This paper randomly selects 500 images from the GLADNet dataset to start the model training and then adopts the proposed semi-supervised learning scheme to train the model. The natural question is why not directly use all 5000 images with ground truth to train the model using standard supervised learning? What's the advantage of the proposed semi-supervised learning over supervised learning?
We agree that using more data with ground truth will almost always produce a better result. The advantage of semi-supervised learning is that it requires fewer images with ground truth than direct supervised learning, and ground truths are not always easy to obtain. Thus, one of our intended contributions is to show that even with a smaller number of ground truths, we can produce state-of-the-art results. In practice, however, the difference between using 500 and 5000 ground truths was small enough to give merit to semi-supervised learning, which we argue is an additional contribution.
1.3. A mixed loss function is adopted to train the model. There should be an ablation study to verify the contribution of each single loss term.
Fig. 3 shows the differences in SSIM and PSNR for each single loss term during training. Instead of using no-reference metrics, we ran our ablation test using SSIM and PSNR computed against the ground truths. Interestingly, a single SSIM loss does not produce the highest SSIM score on the ablation test images. This is probably due to a lack of diversity or an insufficient number of training data, which our approach can compensate for through the physical model and a mixture of different loss functions, such as the smoothness and brightness functions. Fig. 4 also shows a qualitative assessment of different weight values in the mixed loss function. We understand that more can be done to make our ablation study more complete; however, we hope the study presented here is of sufficient interest for publication.
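To make the ablation setup concrete, the following is a minimal sketch of how a weighted mixed loss and its single-term ablations could be organized. It is an illustration only, not the implementation used in the manuscript: the function names, the single-window (global) SSIM, the total-variation smoothness term, the brightness target of 0.6, and the weight keys are all assumptions introduced here.

```python
import numpy as np

def global_ssim(x, y, c1=0.01 ** 2, c2=0.03 ** 2):
    """SSIM computed over the whole image as one window (a simplification)."""
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / (
        (mx ** 2 + my ** 2 + c1) * (vx + vy + c2)
    )

def smoothness_loss(img):
    """Total-variation style term: mean absolute difference between neighbors."""
    return np.abs(np.diff(img, axis=0)).mean() + np.abs(np.diff(img, axis=1)).mean()

def brightness_loss(img, target=0.6):
    """Distance of the mean intensity from a hypothetical target brightness."""
    return abs(img.mean() - target)

def mixed_loss(pred, ref, weights):
    """Weighted sum of loss terms; a missing or zero weight ablates that term."""
    terms = {
        "ssim": 1.0 - global_ssim(pred, ref),
        "smooth": smoothness_loss(pred),
        "bright": brightness_loss(pred),
    }
    total = sum(weights.get(name, 0.0) * value for name, value in terms.items())
    return total, terms
```

Under these assumptions, a single-term ablation corresponds to passing one weight only, for example `mixed_loss(pred, ref, {"ssim": 1.0})`, while the full mixture passes all three weights.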
1.4. In addition to the adopted performance measure, the authors are recommended to also use UNIQUE Zhang et al. [2021], a SOTA NR-IQA metric, to validate the performance of competing methods.
Thank you for referring us to UNIQUE Zhang et al. [2021], a good study on blind image quality assessment (BIQA). We have cited this paper in our study. Unfortunately, at present we are not able to include a full-fledged comparison using the UNIQUE metric for all of the datasets and compared studies. We have observed UNIQUE's wide acceptance in BIQA studies, but for image restoration studies this metric is comparatively new to the community. In a future study, we will investigate the applicability of the UNIQUE metric to restoration tasks from different domains and aim to present a versatile assessment. In the meantime, we hope our current manuscript, with three different blind evaluation metrics, is sufficient to present the merit of our study.

Reviewer #2
2.1. The specific structure of the network model is not described in detail. The physical model is from previous work LIME Guo et al. [2017], which is slightly less innovative on network model structure.
We use DnCNN Zhang et al. [2017] for our network. We have updated Fig. 2 and the manuscript to describe the model. However, unlike many previous studies, we want to emphasize that the structure of the network is not important to the proposed approach. In Fig. 2, our feature map aggregator is a universal block: any network, with a slight modification (one input and two outputs instead of one, etc.), can be used as the core feature extractor, which learns the approximate features for A(x) and t(x) to solve equation (4) for low-light enhancement.
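As an illustration of how the two outputs might be used, the sketch below inverts the standard atmospheric scattering (haze) model, I(x) = J(x) t(x) + A(x) (1 - t(x)), given estimates of A(x) and t(x). Equation (4) is not reproduced in this response, so the exact form in the manuscript may differ. The function name, the lower bound `t_min`, and the arrays standing in for the network outputs are hypothetical.

```python
import numpy as np

def recover_scene(observed, atmosphere, transmission, t_min=0.1):
    """Solve I = J*t + A*(1 - t) for J, given estimates of A and t.

    observed:     H x W x 3 image I(x), values in [0, 1]
    atmosphere:   scalar or array broadcastable to I, stand-in for A(x)
    transmission: H x W x 1 map, stand-in for t(x)
    t_min:        assumed lower bound on t(x) to avoid division by near-zero
    """
    t = np.clip(transmission, t_min, 1.0)
    restored = (observed - atmosphere) / t + atmosphere
    return np.clip(restored, 0.0, 1.0)
```

In this sketch the backbone only needs to supply the two arrays, which is consistent with the point above that any network with one input and two outputs could play that role.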